In this part, two maps were produced, one with QGIS and one with RStudio. They are displayed here with brief descriptions of how each map was made, as well as a critical comparison of the tools used to produce them.
This is a map showing the percentage of people not born in London in each borough, with deeper blue indicating a higher percentage.
Thematic map made with QGIS
The data was fetched from the UK Data Service: a separate London boundary shapefile and a CSV file of census data on population structure. To start making the map, add these two layers to QGIS through the “Vector” and “Delimited Text” tabs respectively in the “Data Source Manager”, accessed from the “Layer” menu.
add layers
Before joining the two layers, check the attribute table of the shapefile and the CSV file to make sure there is a common field that can be set as the “join field”. Then join the CSV data to the boundary shapefile through the “Joins” tab in the “Properties” window of the boundary layer.
join layers
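Conceptually, this join matches each boundary feature to a CSV row on the shared key field. A minimal sketch of that lookup logic in plain Python (the field name `code` and all sample values here are made up for illustration, not taken from the real data):

```python
# Join CSV attribute rows to boundary features on a common key field.
# Field names and records below are illustrative only.
boundaries = [
    {"code": "E09000001", "name": "City of London"},
    {"code": "E09000002", "name": "Barking and Dagenham"},
]
census = [
    {"code": "E09000001", "pct_not_born_in_london": 60.4},
    {"code": "E09000002", "pct_not_born_in_london": 44.9},
]

# Index the CSV rows by the join field, then enrich each boundary feature.
by_code = {row["code"]: row for row in census}
joined = [{**feat, **by_code.get(feat["code"], {})} for feat in boundaries]

print(joined[0]["pct_not_born_in_london"])  # 60.4
```

QGIS performs the same key-based matching internally when the “join field” is set; rows with no matching key simply gain no attributes, as with `by_code.get` here.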
Next, style the data as desired by adjusting the settings under the “Symbology” and “Labels” tabs in the “Properties” window of the boundary layer.
adjust symbologies
adjust labels
The base map was added through “XYZ Tiles”. To load the base map options, run the Python script obtained from an online tutorial in the “Python Console”, accessed from the “Plugins” menu.
add basemap
Lastly, the layout was generated using “Layout Manager”.
generate layout
As shown in the “Data Source Manager”, QGIS can compile various types of data and add them via different paths, which makes it convenient to work with all kinds of open-source databases. Various plugins, including the Python Console, are also supported, which allows it to connect with the resources and functions of other platforms.
While using QGIS to display the data, the visualization is rendered immediately and can be adjusted accordingly to best convey the information. Colours, line styles, and text styles are previewed directly before being applied to the map. Generating a layout is also intuitive in a GUI-based tool, because the functions for adding different cartographic elements are nested within it, and the elements can be easily adjusted and moved around to achieve the desired result.
Below is the code demonstrating the steps of building an interactive map of the same theme and a similar rendering style, made with the tmap package in RStudio using the same data files.
library(tidyverse)
## -- Attaching packages -------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ----------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(sp)
library(rgdal)
## rgdal: version: 1.3-6, (SVN revision 773)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
## Path to GDAL shared files: C:/Users/alexy/Documents/R/win-library/3.5/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: C:/Users/alexy/Documents/R/win-library/3.5/rgdal/proj
## Linking to sp version: 1.3-1
library(shinyjs)
##
## Attaching package: 'shinyjs'
## The following object is masked from 'package:sp':
##
## show
## The following objects are masked from 'package:methods':
##
## removeClass, show
library(htmltools)
library(RColorBrewer)
library(tmaptools)
library(tmap)
BoroughBd <- readOGR("Part1/shapefiles/england_lad_2011.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\alexy\Desktop\CASA0005GISAssessment\Part1\shapefiles\england_lad_2011.shp", layer: "england_lad_2011"
## with 33 features
## It has 4 fields
LondonData <- read.csv("Part1/LondonData.csv")
LondonData <- data.frame(LondonData)
LondonBoroughs <- LondonData[grep("^E09",LondonData[,3]),] # select rows of London Boroughs
LondonBoroughs <- LondonBoroughs[,c(3,16)] # select needed columns
LondonBoroughs <- LondonBoroughs[2:34,] # remove the duplicated row
BoroughBd@data <- data.frame(BoroughBd@data,LondonBoroughs[match(BoroughBd@data[,"code"],LondonBoroughs[,"code"]),]) # join the attribute data to the SP data
names(BoroughBd)[3] <- c("Borough Name") # rename the column
## Warning in checkNames(value): attempt to set invalid names: this may lead
## to problems later on. See ?make.names
names(BoroughBd)[6] <- c("Percentage")
## Warning in checkNames(value): attempt to set invalid names: this may lead
## to problems later on. See ?make.names
BoroughBd <- BoroughBd[c(3,6)] # extract the two necessary columns
Borough_repro <- spTransform(BoroughBd, CRS("+proj=longlat +datum=WGS84")) # reproject the data
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(Borough_repro) +
tm_polygons("Percentage",
style="jenks",
palette=get_brewer_pal("Blues", n = 5,contrast = c(0,1)),
border.col = "white",
midpoint=NA,
popup.vars="Percentage",
title="% of people not born in London")
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
Compared to QGIS, R is handier for data manipulation, especially of attribute data. It is easy to clean and slice the necessary data with built-in functions in R. It is also much easier to manage a project with R when it has to be revised frequently or involves group work, because the connection with Git makes it possible to perform version control and to carry on the work across multiple workspaces.
As for visualization, in terms of making a simple thematic map, although R is capable of making interactive maps with certain packages, it seems to be inferior to a GUI-based tool, since all of those rendering features have to be customized through code. It takes much more effort to get to know and become familiar with the tools and functions available in R, since they are not shown directly anywhere. Besides, to realize the desired visualization effect, which involves a lot of detailed tweaking, cartographers need to refer to the documentation and type code constantly. It also takes extra time to debug and to figure out the result that each line of code will produce.
This part aims to solve six given questions about certain spatial features and relationships by choosing suitable analytical tools and methods. The process of solution design and the analysis will also be discussed.
To answer the questions, approaches including geometry calculation, buffer generation, location query, attribute manipulation, statistical summarization, and spatial clustering will be needed. These approaches can all be carried out with the built-in geoprocessing tools in ArcGIS, which are easy to find throughout the interface of the software and intuitive to use. Thus, ArcGIS will be the main tool used to solve the problems. To perform these analyses with tools in ArcGIS, the existing data and information in the form of KML and CSV need to be converted to feature classes, with the attribute tables joined to the features, which can also be done with the conversion tools in the software.
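At its core, the KML-to-feature-class conversion means pulling the coordinate strings out of each placemark. A rough sketch of that extraction with Python’s standard library (the KML fragment below is a made-up minimal example, not the real timeline export):

```python
import xml.etree.ElementTree as ET

# A minimal, made-up KML fragment with one LineString placemark.
kml = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <LineString>
      <coordinates>-0.1276,51.5072,0 -0.1426,51.5014,0</coordinates>
    </LineString>
  </Placemark>
</kml>"""

# KML coordinates are whitespace-separated "lon,lat,alt" triples.
ns = {"k": "http://www.opengis.net/kml/2.2"}
root = ET.fromstring(kml)
coords = []
for node in root.findall(".//k:coordinates", ns):
    for triple in node.text.split():
        lon, lat, _ = map(float, triple.split(","))
        coords.append((lon, lat))

print(coords)  # [(-0.1276, 51.5072), (-0.1426, 51.5014)]
```

ArcGIS’s “KML To Layer” tool handles this parsing (plus geometry building and styling) automatically; the sketch only shows the underlying idea.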
General Workflow
The detailed processes and tools used to solve each question are explained with the workflow diagram below.
Detailed workflow
The first five questions essentially all ask for a statistical summary of a certain numerical attribute of a group of spatial features. Therefore, the answers were easily reached through the attribute tables of the feature classes or shapefiles converted from the original data. However, when carrying out this process in ArcGIS, some steps, such as conversion and reprojection, had to be repeated for multiple layers, which was somewhat time-consuming and tedious. In this case, a code-based environment might actually be more efficient for processing the data. As for the last question, which concerns the analysis of a point spatial pattern, a pertinent analytical model is required to give an answer. Of the two commonly used models for analyzing point spatial patterns, Quadrat Analysis and Ripley’s K test, Ripley’s K is more reliable because it is not influenced by the scale and shape of the observation window, so Ripley’s K test was performed. While both ArcGIS and R have tools or functions to run the test, R was used in this case because a certain error kept occurring in ArcGIS and could not be resolved.
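The logic behind the Ripley’s K comparison can be illustrated with a naive, uncorrected estimator: under complete spatial randomness the expected value is K(r) = πr², so observed values above that curve suggest clustering. A toy sketch in Python (the point pattern and window are invented; the actual analysis uses spatstat’s edge-corrected estimator):

```python
import math
from itertools import combinations

# Naive Ripley's K estimate (no edge correction):
# K(r) = (A / n^2) * number of ordered point pairs closer than r.
# Under complete spatial randomness, K(r) is approximately pi * r^2.
def k_naive(points, area, r):
    n = len(points)
    close = sum(2 for p, q in combinations(points, 2) if math.dist(p, q) <= r)
    return area * close / n**2

# Made-up toy data: a tight 5 x 5 grid of points inside a 100 x 100 window.
cluster = [(50 + i, 50 + j) for i in range(5) for j in range(5)]
area = 100 * 100

r = 10
k_obs = k_naive(cluster, area, r)
k_csr = math.pi * r**2
print(k_obs > k_csr)  # True: the clustered pattern exceeds the CSR expectation
```

With every point of the toy cluster within 10 units of every other, the naive estimate is far above πr² ≈ 314, which is exactly the signature of clustering that the real test looks for.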
Answer to Q1
As shown above, the total length of the lines is calculated to be 64148.978657 metres. That is to say, the distance travelled is about 64 km. However, the accuracy of this result is limited by the accuracy of the raw timeline KML data from Google Maps, because instead of recording the exact route of movement, the timeline function can only form a general route by drawing straight lines between the spots that have been visited. Besides, the process of reprojecting the data to BNG could also have introduced some error into the calculation of the line lengths.
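The length computation itself is just the sum of straight-line segment lengths between consecutive vertices in the projected (BNG) coordinates. A minimal sketch with made-up easting/northing values, not the real route:

```python
import math

# Total length of a polyline in projected coordinates (metres),
# computed as the sum of its straight segment lengths.
def polyline_length(vertices):
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

# Made-up easting/northing vertices (metres), for illustration only.
route = [(530000, 180000), (530300, 180400), (530300, 181400)]
print(polyline_length(route))  # 500.0 + 1000.0 = 1500.0
```

This is what ArcGIS’s geometry calculation does for each feature once the layer is in a metric projected CRS such as BNG, which is why the reprojection step matters for the result.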
Answer to Q2
By selecting stations within the buffer layer, 38 stations are selected. Thus, the route passes within 100 metres of 38 stations.
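Selecting stations inside a 100 m buffer is equivalent to testing each station’s shortest distance to the route’s segments. A quick sketch of that test (the route and station coordinates are invented):

```python
import math

# Shortest distance from point p to segment ab (projected coordinates, metres).
def point_segment_dist(p, a, b):
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    # Clamp the projection parameter so we measure to the segment, not the line.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))

# A station is "within 100 m" if it is that close to any segment of the route.
def near_route(station, route, tol=100.0):
    return any(point_segment_dist(station, a, b) <= tol
               for a, b in zip(route, route[1:]))

# Made-up route and stations, not the real data.
route = [(530000, 180000), (531000, 180000)]
print(near_route((530500, 180080), route))  # True: 80 m from the route
print(near_route((530500, 180250), route))  # False: 250 m away
```

The buffer-then-select workflow in ArcGIS gives the same answer geometrically; the buffer polygon is just a precomputed version of this distance threshold.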
Answer to Q3
By summing up the numbers in the “points” field of the selected entries, the total points scored is 26. To reach this answer, the Google Maps API and a chunk of Python code (see appendix 1) were used to geocode the treasure hunt locations. This geocoding process could have introduced some error when mapping the locations.
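The geocoding step roughly amounts to sending each location name to the Google Maps Geocoding API and reading coordinates back from the response. A sketch of just the request construction, using only the standard library (no network call is made; the address is made up and `YOUR_API_KEY` is a placeholder, not a working key):

```python
from urllib.parse import urlencode

# Build a Google Maps Geocoding API request URL.
# Actually sending it requires a valid API key and a network request.
def geocode_url(address, key):
    base = "https://maps.googleapis.com/maps/api/geocode/json"
    return base + "?" + urlencode({"address": address, "key": key})

url = geocode_url("Trafalgar Square, London", "YOUR_API_KEY")
print(url)
```

The JSON response, when fetched, contains the matched coordinates under `results[0]["geometry"]["location"]`; ambiguous place names are one source of the mapping error mentioned above, since the API returns its best guess.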
Answer to Q4(a)
Answer to Q4(b)
In the attribute table of the layer of wards passed through, sorting by the male life expectancy field shows that the first entry is Westbourne, which has the lowest value, and the last is Knightsbridge and Belgravia, which has the highest.
Answer to Q5
Again, by looking at the statistics of the attribute table of the related layer, the average life expectancy is 79.63 years for males and 84.52 years for females.
The Ripley’s K test was performed using the Kest function in the spatstat package in R. The code used is shown below.
#load the necessary libraries
library(spatstat)
## Loading required package: spatstat.data
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## Loading required package: rpart
##
## spatstat 1.57-1 (nickname: 'Cartoon Physics')
## For an introduction to spatstat, type 'beginner'
library(sp)
library(rgeos)
## rgeos version: 0.4-1, (SVN revision 579)
## GEOS runtime version: 3.6.1-CAPI-1.10.1
## Linking to sp version: 1.3-1
## Polygon checking: TRUE
library(maptools)
## Checking rgeos availability: TRUE
library(GISTools)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:spatstat':
##
## area
## The following object is masked from 'package:dplyr':
##
## select
library(tmap)
library(sf)
library(geojsonio)
##
## Attaching package: 'geojsonio'
## The following object is masked from 'package:base':
##
## pretty
library(tmaptools)
library(rgdal)
#read the data of London wards and treasure hunt locations
TreasHuntPoint <- readOGR("Part2/shapefiles/TreasHuntPoints.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\alexy\Desktop\CASA0005GISAssessment\Part2\shapefiles\TreasHuntPoints.shp", layer: "TreasHuntPoints"
## with 50 features
## It has 6 fields
## Integer64 fields read as strings: OBJECTID points
LondonWards <- readOGR("Part2/shapefiles/LondonData_Joined.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\alexy\Desktop\CASA0005GISAssessment\Part2\shapefiles\LondonData_Joined.shp", layer: "LondonData_Joined"
## with 625 features
## It has 74 fields
## Integer64 fields read as strings: OBJECTID Population Children_a Working_ag Older_peop Median_Age Number_Kil In_employm Number_of_ Median_Hou Number_of1 Median_H_1 Number_o_1 percent_Fl ID2010_Ran ID2010_per
#run a point pattern analysis with ripley's K
window <- as.owin(LondonWards)
TreasHunt.ppp <- ppp(x=TreasHuntPoint@coords[,1],y=TreasHuntPoint@coords[,2],window=window)
K <- Kest(TreasHunt.ppp, correction="border")
plot(K)
The plot shows that the observed K value lies above the expected value at almost all distances, indicating that the treasure hunt points are clustered in London.